Maintained by deepset

Integration: Weaviate

Use a Weaviate database with Haystack

Authors
deepset

Haystack 2.0


Installation

Use pip to install the Weaviate integration for Haystack:

pip install weaviate-haystack

Usage

Once installed, initialize your Weaviate database to use it with Haystack 2.x.

In this example, we use the temporary embedded version for simplicity. To use a self-hosted Docker container or Weaviate Cloud Service, take a look at the docs.

from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore
from weaviate.embedded import EmbeddedOptions

document_store = WeaviateDocumentStore(embedded_options=EmbeddedOptions())

Writing Documents to WeaviateDocumentStore

To write documents to WeaviateDocumentStore, create an indexing pipeline.

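from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter

# Placeholder: replace with the paths of your own text files
file_paths = ["path/to/file.txt"]

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
# The converter's documents output feeds the writer's documents input
indexing.connect("converter", "writer")
indexing.run({"converter": {"sources": file_paths}})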
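The file_paths value passed to the converter is a list of paths to plain-text files. A minimal sketch of one way to build that list from a directory, using only the standard library; the collect_text_files helper and the "data" directory name are hypothetical and not part of Haystack:

```python
from pathlib import Path
from typing import List


def collect_text_files(directory: str, pattern: str = "*.txt") -> List[str]:
    """Return the sorted paths of all files under `directory` matching `pattern`."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{directory!r} is not a directory")
    # rglob also walks subdirectories; sorting keeps the indexing order reproducible.
    return sorted(str(p) for p in root.rglob(pattern) if p.is_file())
```

With this helper, file_paths = collect_text_files("data") would produce the list that the indexing pipeline expects.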

License

weaviate-haystack is distributed under the terms of the Apache-2.0 license.

Haystack 1.x

Haystack supports the use of Weaviate as data storage for LLM pipelines, with the WeaviateDocumentStore. You can choose to run Weaviate locally yourself, or use a hosted Weaviate database.

For details on the available methods and parameters of the WeaviateDocumentStore, check out the Haystack API Reference and Documentation.

Installation

pip install farm-haystack[weaviate]

Usage

To use Weaviate as your data storage for your Haystack LLM pipelines, you should have it running locally or have a hosted instance. Then, you can initialize a WeaviateDocumentStore:

from haystack.document_stores import WeaviateDocumentStore

document_store = WeaviateDocumentStore(host="http://localhost",
                                       port=8080,
                                       embedding_dim=768)

Writing Documents to WeaviateDocumentStore

To write documents to your WeaviateDocumentStore, create an indexing pipeline or call the write_documents() function. For this step, you can use the available FileConverters and PreProcessors, as well as other Integrations that help you fetch data from other sources. The example pipeline below indexes your Markdown files into a Weaviate database. It stores both the contents of the files and their embeddings, which makes vector search over the files possible.

Indexing Pipeline

from haystack import Pipeline
from haystack.document_stores import WeaviateDocumentStore
from haystack.nodes import EmbeddingRetriever, MarkdownConverter, PreProcessor

document_store = WeaviateDocumentStore(host="http://localhost",
                                       port=8080,
                                       embedding_dim=768)
converter = MarkdownConverter()
preprocessor = PreProcessor()
retriever = EmbeddingRetriever(document_store = document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")

indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=converter, name="MarkdownConverter", inputs=["File"])
indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["MarkdownConverter"])
indexing_pipeline.add_node(component=retriever, name="Retriever", inputs=["PreProcessor"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["Retriever"])

indexing_pipeline.run(file_paths=["filename.md"])
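The write_documents() route mentioned earlier can be sketched as follows. This is a rough illustration, not the canonical approach: the build_documents and index_directly helpers are hypothetical, the document_store and retriever arguments are assumed to be the objects created in the pipeline example, and the code assumes that the Haystack 1.x store accepts dictionaries with "content" and "meta" keys and exposes update_embeddings(). Unlike MarkdownConverter, this sketch stores the raw Markdown text without stripping markup.

```python
from pathlib import Path
from typing import Any, Dict, List


def build_documents(directory: str) -> List[Dict[str, Any]]:
    """Read each Markdown file in `directory` into a document dictionary."""
    docs = []
    for path in sorted(Path(directory).glob("*.md")):
        docs.append({
            "content": path.read_text(encoding="utf-8"),  # raw Markdown, markup kept
            "meta": {"name": path.name},
        })
    return docs


def index_directly(document_store, retriever, directory: str) -> int:
    """Write the documents, then compute their embeddings; returns the document count."""
    docs = build_documents(directory)
    # Assumption: write_documents() accepts a list of dictionaries (Haystack 1.x).
    document_store.write_documents(docs)
    # Assumption: update_embeddings() embeds the stored documents with the retriever.
    document_store.update_embeddings(retriever)
    return len(docs)
```

Calling index_directly(document_store, retriever, "docs") would then index every Markdown file in a hypothetical "docs" directory without building a pipeline.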

Using Weaviate in a Query Pipeline

Once you have documents in your WeaviateDocumentStore, it’s ready to be used in any Haystack pipeline. For example, below is a pipeline that makes use of a custom prompt that, given a query, is designed to generate long answers based on the retrieved documents.

from haystack import Pipeline
from haystack.document_stores import WeaviateDocumentStore
from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate

document_store = WeaviateDocumentStore(host="http://localhost",
                                       port=8080,
                                       embedding_dim=768)

retriever = EmbeddingRetriever(document_store = document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
prompt_template = PromptTemplate(prompt = """Given the provided Documents, answer the Query. Make your answer detailed and long\n
                                              Query: {query}\n
                                              Documents: {join(documents)}
                                              Answer:
                                          """,
                                          output_parser=AnswerParser())
prompt_node = PromptNode(model_name_or_path = "gpt-4",
                         api_key = "YOUR_OPENAI_KEY",
                         default_prompt_template = prompt_template)

query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])

query_pipeline.run(query = "What is Weaviate", params={"Retriever" : {"top_k": 5}})
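The call above returns a result dictionary. A small sketch of how the generated text might be read from it, assuming that with AnswerParser the result holds a list under the "answers" key whose items expose an answer attribute; the extract_answers helper is hypothetical:

```python
from typing import Any, Dict, List


def extract_answers(result: Dict[str, Any]) -> List[str]:
    """Collect the non-empty answer strings from a pipeline result dictionary."""
    # Assumption: answers live under "answers" and each item has an `.answer` string.
    answers = result.get("answers", [])
    return [a.answer for a in answers if getattr(a, "answer", None)]
```

For example, storing the return value of query_pipeline.run(...) in a variable named result and looping over extract_answers(result) would print each generated answer in turn.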